Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network

Kotoku, Shun, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonic-based decision-making algorithm to address one of the most fundamental problems in MARL, called the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, along with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicitly sharing information among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.
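The balance between exploration and exploitation that the abstract highlights is the crux of any bandit algorithm. As a point of reference only (not the paper's photonic method), a minimal single-agent epsilon-greedy sketch for a stochastic multi-armed bandit looks like this; all names and parameters are illustrative:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy player for a stochastic multi-armed bandit.

    With probability epsilon we explore a uniformly random arm; otherwise
    we exploit the arm with the highest empirical mean so far.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0  # Bernoulli arm
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return means, counts, total_reward
```

In the competitive (multi-agent) variant studied in the paper, several such players collide on the same arms, which is what makes decentralized coordination without explicit information sharing non-trivial.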


Asymmetric leader-laggard cluster synchronization for collective decision-making with laser network

Kotoku, Shun, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi, Naruse, Makoto

arXiv.org Artificial Intelligence

Photonic accelerators [1] have been gaining attention in recent years, and a variety of implementations and applications have now been explored [2-9]. These advancements can be attributed to a growing awareness of the saturating speed of performance improvements in conventional computational systems [10], despite the soaring demands for information processing in an extensive range of applications, especially in machine learning. Reinforcement learning [11] is a subfield of machine learning that involves optimizing computer outputs or actions to maximize the reward function. Its applications are now essential to our daily lives, ranging from self-driving vehicles [12] and targeted advertising [13] to wireless networking [14], and there is now a strong demand for computational acceleration. Specifically, what we focus on here is decision-making.


Improved Convergence Guarantees for Shallow Neural Networks

Razborov, Alexander

arXiv.org Artificial Intelligence

We continue a long line of research aimed at proving convergence of depth-2 neural networks, trained via gradient descent, to a global minimum. As in many previous works, our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances and network initialization, adversarial labels. It is more general in the sense that we allow both layers to be trained simultaneously and at different rates. Our results improve on the state of the art [Oymak Soltanolkotabi 20] (training the first layer only) and [Nguyen 21, Section 3.2] (training both layers with Le Cun's initialization). We also report several simple experiments with synthetic data. They strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the "NTK regime".
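The training setup the abstract describes (depth-2 ReLU network, quadratic loss, both layers trained simultaneously at different rates) can be sketched concretely. The toy implementation below is an illustration under simplifying assumptions (scalar inputs, plain full-batch gradient descent), not the paper's experimental code:

```python
import random

def train_two_layer_relu(xs, ys, hidden=8, steps=500, lr1=0.05, lr2=0.01, seed=0):
    """Gradient descent on a depth-2 ReLU network f(x) = sum_k a_k * relu(w_k * x)
    with quadratic loss, using *different* learning rates for the two layers:
    lr1 for the first layer w, lr2 for the output layer a.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(hidden)]  # Gaussian initialization
    a = [rng.gauss(0, 1) for _ in range(hidden)]
    n = len(xs)
    for _ in range(steps):
        gw = [0.0] * hidden
        ga = [0.0] * hidden
        loss = 0.0
        for x, y in zip(xs, ys):
            h = [max(0.0, w[k] * x) for k in range(hidden)]
            pred = sum(a[k] * h[k] for k in range(hidden))
            err = pred - y
            loss += 0.5 * err * err / n
            for k in range(hidden):
                ga[k] += err * h[k] / n
                if w[k] * x > 0:                  # ReLU subgradient
                    gw[k] += err * a[k] * x / n
        for k in range(hidden):
            w[k] -= lr1 * gw[k]  # first layer, rate lr1
            a[k] -= lr2 * ga[k]  # second layer, rate lr2
    return w, a, loss
```

Setting lr2 = 0 recovers the first-layer-only regime of [Oymak Soltanolkotabi 20] that the paper generalizes.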


Gamification of Pure Exploration for Linear Bandits

Degenne, Rémy, Ménard, Pierre, Shang, Xuedong, Valko, Michal

arXiv.org Machine Learning

We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insight over different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality.

Since the early work of Robbins (1952), a great amount of literature explores MAB in their standard stochastic setting with its numerous extensions and variants. Even-Dar et al. (2002) and Bubeck et al. (2009) are among the first to study the pure exploration setting for stochastic bandits. A non-exhaustive list of pure exploration games includes best-arm identification (BAI), top-m identification (Kalyanakrishnan & Stone, 2010), threshold bandits (Locatelli et al., 2016), minimum threshold (Kaufmann et al., 2018), signed bandits (Ménard, 2019), and pure exploration combinatorial bandits.
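For readers new to pure exploration, a classic baseline for best-arm identification in standard (non-linear) stochastic bandits is successive elimination. The sketch below is a generic textbook procedure with a Hoeffding-style confidence radius, not the algorithm proposed in this paper:

```python
import math
import random

def successive_elimination(true_means, delta=0.05, seed=1):
    """Successive elimination for best-arm identification (standard MAB).

    Pull every surviving arm once per round, then drop any arm whose
    upper confidence bound falls below the best lower confidence bound.
    Returns the identified arm and the total number of pulls.
    """
    rng = random.Random(seed)
    n = len(true_means)
    alive = list(range(n))
    counts = [0] * n
    means = [0.0] * n
    t = 0
    while len(alive) > 1:
        t += 1
        for a in alive:
            r = 1.0 if rng.random() < true_means[a] else 0.0  # Bernoulli arm
            counts[a] += 1
            means[a] += (r - means[a]) / counts[a]
        # Hoeffding-style radius, union-bounded over arms and rounds
        rad = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        best_lcb = max(means[a] for a in alive) - rad
        alive = [a for a in alive if means[a] + rad >= best_lcb]
    return alive[0], sum(counts)
```

The difficulty the paper addresses is that in the linear case pulling one arm informs estimates of all others, so naive per-arm elimination like this is far from sample-optimal.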


Non-Asymptotic Pure Exploration by Solving Games

Degenne, Rémy, Koolen, Wouter M., Ménard, Pierre

arXiv.org Machine Learning

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state of the art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite confidence guarantees that are adapted to the exponential family and which apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best response oracle instead of fully solving the optimisation problem.
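The learner-versus-oracle structure described here (a no-regret learner for one player, a best-response oracle for the other) can be illustrated on a plain zero-sum matrix game. The following sketch runs Hedge for the max-player against an exact best response for the min-player; it is a schematic analogue of the saddle-point iteration, not the paper's sampling rule:

```python
import math

def solve_zero_sum_game(A, iterations=2000, eta=0.05):
    """Approximate the saddle point of the zero-sum game max_x min_y x^T A y.

    The max-player runs Hedge (a no-regret learner) over the rows of A;
    the min-player answers each round with an exact best-response column.
    Returns the averaged max-player strategy and the value it guarantees.
    """
    n = len(A)
    m = len(A[0])
    weights = [1.0] * n
    avg_x = [0.0] * n
    for _ in range(iterations):
        total = sum(weights)
        x = [w / total for w in weights]
        # Best-response oracle: column j minimising x^T A e_j
        col_vals = [sum(x[i] * A[i][j] for i in range(n)) for j in range(m)]
        j = min(range(m), key=col_vals.__getitem__)
        # Hedge update for the max-player against that response
        for i in range(n):
            weights[i] *= math.exp(eta * A[i][j])
        for i in range(n):
            avg_x[i] += x[i] / iterations
    value = min(sum(avg_x[i] * A[i][j] for i in range(n)) for j in range(m))
    return avg_x, value
```

By standard no-regret arguments, the averaged strategy converges to the saddle point; in the paper the analogous iteration is run on a plug-in estimate of the sample-complexity optimisation problem rather than on a known matrix.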